Neural Voice Cloning with a Few Samples

Authors

  • Sercan Ömer Arik
  • Jitong Chen
  • Kainan Peng
  • Wei Ping
  • Yanqi Zhou

Abstract

Voice cloning is a highly desired feature for personalized speech interfaces. Neural network based speech synthesis has been shown to generate high-quality speech for a large number of speakers. In this paper, we introduce a neural voice cloning system that takes a few audio samples as input. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding is based on training a separate model to directly infer a new speaker embedding from the cloning audio, which is then used with a multi-speaker generative model. In terms of the naturalness of the speech and its similarity to the original speaker, both approaches can achieve good performance, even with very few cloning audio samples. While speaker adaptation can achieve better naturalness and similarity, the speaker encoding approach requires significantly less cloning time or memory, making it favorable for low-resource deployment.
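The two approaches differ in how a new speaker's embedding is obtained. The toy sketch below is not the authors' model; it only contrasts the two ideas under loudly stated assumptions: a frozen linear "generator" `W` that maps a 2-d speaker embedding to 2-d acoustic features, a hypothetical pre-trained linear encoder `V`, and mean pooling over cloning samples (a simplification). The helper names `adapt_embedding` and `encode_speaker` are hypothetical. Adaptation here is an embedding-only variant: gradient descent updates just the new speaker's embedding while the generator stays fixed.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Multiply matrix a by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def adapt_embedding(w: Matrix, samples: List[Vector], dim: int,
                    lr: float = 0.1, steps: int = 500) -> Vector:
    """Speaker adaptation (toy, embedding-only): run gradient descent on a
    new speaker embedding e, keeping the generator weights w frozen, to
    minimise the mean squared error between w @ e and each cloning sample."""
    e = [0.0] * dim
    n = len(samples)
    for _ in range(steps):
        pred = matvec(w, e)
        grad = [0.0] * dim
        for x in samples:
            # Residual r = w @ e - x; the gradient of ||r||^2 w.r.t. e is 2 * w^T r.
            r = [p - xi for p, xi in zip(pred, x)]
            for j in range(dim):
                grad[j] += 2.0 * sum(w[i][j] * r[i] for i in range(len(r))) / n
        e = [ej - lr * gj for ej, gj in zip(e, grad)]
    return e


def encode_speaker(v: Matrix, samples: List[Vector]) -> Vector:
    """Speaker encoding (toy): a separate, hypothetical pre-trained encoder v
    maps each cloning sample to an embedding, and the per-sample embeddings
    are mean-pooled. No gradient steps are needed at cloning time."""
    per_sample = [matvec(v, x) for x in samples]
    dim = len(per_sample[0])
    return [sum(emb[j] for emb in per_sample) / len(per_sample) for j in range(dim)]


# Hypothetical toy setup: the true speaker embedding is [1.0, -2.0].
W = [[2.0, 0.0], [0.0, 0.5]]   # frozen toy "generator"
V = [[0.5, 0.0], [0.0, 2.0]]   # toy "trained" encoder (here exactly W's inverse)
cloning = [[2.1, -0.9], [1.9, -1.1], [2.0, -1.0]]  # three noisy cloning samples

adapted = adapt_embedding(W, cloning, dim=2)  # approximately [1.0, -2.0]
encoded = encode_speaker(V, cloning)          # approximately [1.0, -2.0]
print(adapted, encoded)
```

Even in this toy, the cost difference noted in the abstract is visible: adaptation runs hundreds of gradient steps per new speaker, whereas encoding is a single forward pass per cloning sample followed by an average.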


Similar resources

Artificial Neural Networks and Support Vector Machines for Parkinson Disease Detection using Human Voice

An artificial neural network (ANN) with tansig, logsig and purelin transfer functions, support vector machines (SVM), and linear and quadratic classifiers are used in this work to detect Parkinson disease from voice features. In Parkinson disease, a person's voice changes because of tremor in the voice-box muscles. A total of 195 phonations were used for the analysis from twenty th...

Text-Dependent Speaker Verification System Using Neural Network

This paper presents the use of a backpropagation neural network to implement voice recognition. The focus is to identify the voice patterns of different people so as to recognize their voices electronically. The signals corresponding to a text phrase spoken by a group of people are recorded in voice files on a computer using sound recording software. The information in these files is converted from time do...

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source an...

Tone Quality Improvement of Bone Conduction Voice by Cepstrum-based Local Conversion Models

A novel tone quality improvement method for bone conduction voice is presented. In this method, the tone quality of the bone conduction voice is converted to a quality similar to that of the air conduction voice. For the voice conversion, the method uses a codebook consisting of various paired code vectors of the bone and air conduction voices. The delta and mel-cepstral coeffici...

A Study of the Criminalization of Human Cloning in Iranian Law

Cloning, as a new technology, has attracted the attention of statesmen, physicians, lawyers and other scientific communities. This phenomenon both opens a new horizon on human society regarding its therapeutic features and brings some concerns to it. This technology is divided into two parts: Human or generative cloning and therapeutic or investigative cloning. The first meaning which comes to ...

Journal:
  • CoRR

Volume: abs/1802.06006

Publication date: 2018